Swap encryption pr1 shared infra#6776
Conversation
…il.IssueCommand Replace all raw ['gcloud', ...] list + vm_util.IssueCommand calls in swap_encryption_benchmark.py with PKB's existing GcloudCommand infrastructure: - _create_benchmark_node_pool: cluster._GcloudCommand() + cmd.flags + cmd.Issue - _delete_default_node_pool: cluster._GcloudCommand() + cmd.Issue - _attach_swap_disk: gcp_util.GcloudCommand(_GcpZonalResource) for create+attach - _delete_disk_by_name: gcp_util.GcloudCommand for describe/detach/delete Add _GcpZonalResource shim: pins zone for gcloud compute operations. GcloudCommand auto-injects --project and --zone/--region, handles auth token refresh -- matching PKB standards.
…fix imports
Replace manual temp-file + kubectl apply in _deploy_daemonset() with
PKB's kubernetes_commands.ApplyManifest():
- Remove _daemonset_yaml() helper
- _deploy_daemonset() delegates to kubernetes_commands.ApplyManifest(
'cluster/swap_encryption_daemonset.yaml.j2', **kwargs)
- Add kubernetes_commands import; remove vm_util import (now unused)
- Fix import order: providers.gcp before resources.container_service
… remove cgroup hack Address Ajay review comments on PR GoogleCloudPlatform#6776: Comment #r3457877984 (linuxConfig.swapConfig): Extend --system-config-from-file YAML with linuxConfig blocks: linuxConfig.swapConfig.enabled: true -- GKE sets up node-level swap dedicatedLocalSsdProfile.diskCount: N -- LSSD: use local NVMe for swap linuxConfig.sysctl: vm.swappiness=100, vm.min_free_kbytes=200, vm.watermark_scale_factor=500 Ref: https://cloud.google.com/kubernetes-engine/docs/how-to/node-memory-swap Comment #r3457928855 (cgroup hack): Remove memory.swap.max=max loop from swap_encryption_daemonset.yaml.j2. With kubeletConfig.memorySwapBehavior=LimitedSwap the kubelet manages per-container swap allocation; the cgroup hack is unnecessary.
…5); manifest moved to data/cluster and rendered via vm_util
|
/cc @ajaysundark |
Per Ajay's review comment on PR GoogleCloudPlatform#6758: - Add _GKE_KUBELET_MEMORY_SWAP flag (default LimitedSwap) so the benchmark nodepool is created with kubeletConfig.memorySwapBehavior set via --system-config-from-file, enabling pod-level swap usage. - Wrap gcloud IssueCommand in try/finally to clean up the temp YAML. - Update nodepool creation log to include kubelet_swap value.
| make install 2>&1) > /tmp/pkb_memtier_build.log 2>&1 || \\ | ||
| echo "[pkb] WARNING: memtier_benchmark build failed (see /tmp/pkb_memtier_build.log); redis-benchmark fallback will be used" | ||
| fi | ||
| if command -v memtier_benchmark >/dev/null 2>&1; then |
There was a problem hiding this comment.
This PR calls out this change as only the infra layer. Is this still required to hard-code memtier / other setup here?
The model we discussed with "swap as a base resource" is that by passing a swap config to the vm, we could plug this into other kubernetes benchmarks.
eg:
There was a problem hiding this comment.
Hi @ajaysundark , The DaemonSet was stripped to only measurement tools (fio, cryptsetup, mdadm, sysstat, nvme-cli). Workload benchmarks (redis, memtier, opensearch) run in separate PKB pods. Fix committed in the lean DaemonSet PR1 commit and in PR4's DaemonSet slim-down.
| echo "[pkb] WARNING: kernel source extraction failed" >&2 | ||
| fi | ||
| echo "[pkb] Unlocking container cgroup swap limits..." | ||
| # GKE cgroup v2 sets memory.swap.max=0 per-container, which |
There was a problem hiding this comment.
Kubernetes sets memory.swap.max=0 unless swap config is enabled.
For GKE, this would be already handled if you use --system-config-from-file to handle the setup.
linuxConfig:
swapConfig:
enabled: true
dedicatedLocalSsdProfile:
diskCount: 1
sysctl:
vm.min_free_kbytes: 200
vm.watermark_scale_factor: 500
vm.swappiness: 100
More reference: https://docs.cloud.google.com/kubernetes-engine/docs/how-to/node-memory-swap#enable
There was a problem hiding this comment.
If swap provisioning is handled via DaemonSet (for EKS), it needs to also handle this step: https://kubernetes.io/docs/tutorials/cluster-management/provision-swap-memory/#set-up-kubelet-configuration.
To add kubelet configuratoin and perform a kubelet service restart at the end to pickup swap configuration
There was a problem hiding this comment.
alternatively, if a nodeadm script is used instead of DaemonSet in EKS, the restart will not be needed as the setup is handled before kubelet start.
There was a problem hiding this comment.
Kubernetes sets
memory.swap.max=0unless swap config is enabled.For GKE, this would be already handled if you use --system-config-from-file to handle the setup.
linuxConfig: swapConfig: enabled: true dedicatedLocalSsdProfile: diskCount: 1 sysctl: vm.min_free_kbytes: 200 vm.watermark_scale_factor: 500 vm.swappiness: 100More reference: https://docs.cloud.google.com/kubernetes-engine/docs/how-to/node-memory-swap#enable
Hi @ajaysundark ,
Fixed, The --system-config-from-file YAML now includes the full linuxConfig block alongside kubeletConfig.memorySwapBehavior:
yamlkubeletConfig:
memorySwapBehavior: LimitedSwap
linuxConfig:
swapConfig:
enabled: true
dedicatedLocalSsdProfile:
diskCount: 1 # LSSD path; omitted for standard/hyperdisk
sysctl:
vm.min_free_kbytes: 200
vm.watermark_scale_factor: 500
vm.swappiness: 100
For non-LSSD machine types the dedicatedLocalSsdProfile block is omitted. The _LSSD_COUNT flag controls diskCount when LSDs are present.
There was a problem hiding this comment.
alternatively, if a nodeadm script is used instead of DaemonSet in EKS, the restart will not be needed as the setup is handled before kubelet start.
Hi @ajaysundark , ACKNOWLEDGED in stub. The _configure_eks_kubelet_swap() docstring explicitly documents the nodeadm bootstrap approach (apiVersion: node.eks.aws/v1alpha1 / Kind: NodeConfig / spec.kubelet.config.memorySwapBehavior: LimitedSwap) as the intended implementation path. Deferred to PR #6780.
There was a problem hiding this comment.
If swap provisioning is handled via DaemonSet (for EKS), it needs to also handle this step: https://kubernetes.io/docs/tutorials/cluster-management/provision-swap-memory/#set-up-kubelet-configuration.
To add kubelet configuratoin and perform a kubelet service restart at the end to pickup swap configuration
Hi @ajaysundark , _configure_eks_kubelet_swap() is present in all branches as a documented stub, explaining the nodeadm approach and the PR #6780 dependency. The actual kubelet config write + restart is not implemented — it is explicitly deferred pending PR #6780 (SwapConfigSpec) merging.
| fi | ||
| echo "[pkb] Unlocking container cgroup swap limits..." | ||
| # GKE cgroup v2 sets memory.swap.max=0 per-container, which | ||
| # prevents swap usage even when the node has a swap device and |
There was a problem hiding this comment.
This hacking memory.swap.max=max across kubepods in the manifest is not required when swap kubelet configuration is set as LimitedSwap, as kubelet will allocate swap limits to the container.
You could still set sysctls as shared in the above comment to have a longer swapping window.
There was a problem hiding this comment.
Hi @ajaysundark ,
Fixed. The memory.swap.max=max loop across kubepods has been removed from the DaemonSet manifest. With kubeletConfig.memorySwapBehavior=LimitedSwap set via --system-config-from-file, the kubelet manages per-container swap allocation directly — the manual cgroup write is not needed and was superseded by the kubelet config.
| 'swap_encryption_gke_kubelet_memory_swap', | ||
| 'LimitedSwap', | ||
| 'Value for kubeletConfig.memorySwapBehavior injected via ' | ||
| '--system-config-from-file when creating the GKE benchmark nodepool. ' |
There was a problem hiding this comment.
This correctly calls out the config, but I think probably a miss that the script still calls the daemonset for GKE as well ( _deploy_daemonset at :469)
There was a problem hiding this comment.
Hi @ajaysundark , The DaemonSet is still deployed on GKE. In the current code, it serves as the fio execution vehicle for Phase 1 measurements (all phases run via kubectl exec into the pod) — it's no longer used for swap setup on GKE (that's done natively via linuxConfig.swapConfig)
2b0b687 to
2c7a677
Compare
| # Per Ajay's review (go/pkb-swap-encryption-pr1 #r3457877984): | ||
| # kubeletConfig.memorySwapBehavior: kubelet allocates swap to pods. |
There was a problem hiding this comment.
I dont remember suggesting explicit kubelet configuration needed for
memorySwapBehavior:LimitedSwap in GKE.
This will be handled by the swapConfig automatically. you only need to pass
the swapConfig and sysctl in system-config yaml file.
There was a problem hiding this comment.
Hi @ajaysundark , Done — removed kubeletConfig.memorySwapBehavior from the system-config YAML. The --system-config-from-file now only passes linuxConfig.swapConfig + linuxConfig.sysctl, which is sufficient since GKE automatically enables memorySwapBehavior=LimitedSwap when swapConfig.enabled=true. The _GKE_KUBELET_MEMORY_SWAP flag has also been removed. The YAML passed at nodepool creation is now: ```
linuxConfig:
swapConfig:
enabled: true # (+ dedicatedLocalSsdProfile for LSSD)
sysctl:
vm.min_free_kbytes: 200
vm.watermark_scale_factor: 500
vm.swappiness: 100
| GKE vs. EKS swap encryption and LSSD performance comparison. | ||
| Two-step nodepool setup: PKB provisions a minimal cluster with a cheap | ||
| default nodepool (Step 1), then Prepare() adds the real benchmark | ||
| nodepool (n4-highmem-32 / c4-*-lssd, COS_CONTAINERD, 80k IOPS) with a |
There was a problem hiding this comment.
Is this and other comment references
on COS_CONTAINERD accurate? Seems like we are working with UBUNTU_CONTAINERD
(I'm fine with picking either for now), Does it need an update?
There was a problem hiding this comment.
Hi @ajaysundark ,
Fixed — updated BENCHMARK_CONFIG description, Prepare() docstring, and _create_benchmark_node_pool() docstring to say UBUNTU_CONTAINERD consistently, matching the actual flag default. The one remaining COS_CONTAINERD mention in the _NODE_IMAGE_TYPE flag description ("Use COS_CONTAINERD only when dm-crypt is disabled") is intentional — it documents a valid but non-default option for plain-swap baseline runs.
bec6422 to
57c4884
Compare
- Add SwapDaemonSet(resource.BaseResource) in resources/container_service/swap_daemonset.py - _Create(): apply Jinja2 manifest + wait for Running + /tmp/pkb_ready - _Delete(): in-pod swapoff/dmsetup/losetup/pkill teardown; kubectl delete - PodExec(): transient-reset retry, rc=137 OOM detection, pod recovery - Add SwapNodePool(resource.BaseResource) in resources/container_service/swap_nodepool.py - _Create(): gcloud node-pools create with linuxConfig.swapConfig + optional swap disk - _Delete(): detach+delete disk; delete nodepool - DeleteDefaultPool(): remove dummy e2-medium pool after DaemonSet pod Running - Rewrite benchmark to thin pattern: Prepare() uses resource.Create() + spec.resources - Cleanup() is empty - PKB framework auto-deletes spec.resources - Run() uses daemonset.PodExec() throughout - Addresses Zac review: resources pattern, no infra code in benchmark file - Fix COS_CONTAINERD -> UBUNTU_CONTAINERD (r3472549985) - swapConfig auto-enables memorySwapBehavior=LimitedSwap (r3472513706)
… NodepoolSpec field BREAKING: replaces SwapNodePool (standalone nodepool lifecycle) with the correct PKB pattern: swap configuration declared in BENCHMARK_CONFIG and applied by the existing GKE cluster creation flow. New files: - resources/container_service/swap_config.py - GkeSwapConfig(BaseResource): WriteLinuxConfigYaml(), ValidHyperdiskThroughput() - EksSwapConfig(BaseResource): stub for nodeadm config (deferred to PR GoogleCloudPlatform#6780) Core framework changes: - configs/container_spec.py: add SwapConfigSpec(BaseSpec) + _SwapConfigDecoder + swap_config field on NodepoolSpec - resources/container_service/container.py: add swap_config attr to BaseNodePoolConfig - resources/container_service/container_cluster.py: propagate swap_config in _InitializeNodePool() (mirrors sandbox_config pattern) - providers/gcp/google_kubernetes_engine.py: _AddNodeParamsToCmd() reads nodepool_config.swap_config - applies --system-config-from-file, UBUNTU_CONTAINERD, --no-enable-autorepair, boot-disk-provisioned-iops/throughput Thin benchmark: - BENCHMARK_CONFIG declares benchmark nodepool with swap_config (no separate nodepool create needed - GKE cluster creation handles it) - Prepare(): deploy SwapDaemonSet + delete default-pool - Run(): verify swap_active + swap_encrypted; report samples - Cleanup(): empty (PKB auto-deletes spec.resources) Addresses Ajay reviews: - r3457826290: swap as base resource plugged into GKE cluster creation flow - r3457877984: linuxConfig.swapConfig via --system-config-from-file (GkeSwapConfig) - r3457928855: removed memory.swap.max hack - r3457964593: UBUNTU_CONTAINERD set per-nodepool in _AddNodeParamsToCmd - r3472513706: swapConfig auto-enables memorySwapBehavior=LimitedSwap - r3472549985: UBUNTU_CONTAINERD required for dm-crypt
… NodepoolSpec field BREAKING: replaces SwapNodePool (standalone nodepool lifecycle) with the correct PKB pattern: swap configuration declared in BENCHMARK_CONFIG and applied by the existing GKE cluster creation flow. New files: - resources/container_service/swap_config.py - GkeSwapConfig(BaseResource): WriteLinuxConfigYaml(), ValidHyperdiskThroughput() - EksSwapConfig(BaseResource): stub for nodeadm config (deferred to PR GoogleCloudPlatform#6780) Core framework changes: - configs/container_spec.py: add SwapConfigSpec(BaseSpec) + _SwapConfigDecoder + swap_config field on NodepoolSpec - resources/container_service/container.py: add swap_config attr to BaseNodePoolConfig - resources/container_service/container_cluster.py: propagate swap_config in _InitializeNodePool() (mirrors sandbox_config pattern) - providers/gcp/google_kubernetes_engine.py: _AddNodeParamsToCmd() reads nodepool_config.swap_config - applies --system-config-from-file, UBUNTU_CONTAINERD, --no-enable-autorepair, boot-disk-provisioned-iops/throughput Thin benchmark: - BENCHMARK_CONFIG declares benchmark nodepool with swap_config (no separate nodepool create needed - GKE cluster creation handles it) - Prepare(): deploy SwapDaemonSet + delete default-pool - Run(): verify swap_active + swap_encrypted; report samples - Cleanup(): empty (PKB auto-deletes spec.resources) Addresses Ajay reviews: - r3457826290: swap as base resource plugged into GKE cluster creation flow - r3457877984: linuxConfig.swapConfig via --system-config-from-file (GkeSwapConfig) - r3457928855: removed memory.swap.max hack - r3457964593: UBUNTU_CONTAINERD set per-nodepool in _AddNodeParamsToCmd - r3472513706: swapConfig auto-enables memorySwapBehavior=LimitedSwap - r3472549985: UBUNTU_CONTAINERD required for dm-crypt
GkeSwapConfig and EksSwapConfig now both inherit from BaseSwapConfig(BaseResource). Common sysctl attrs (swappiness, min_free_kbytes, watermark_scale_factor) live in the base class. Cloud-specific attrs remain in each subclass. Addresses Zac review: GkeSwapConfig & EksSwapConfig should inherit from BaseSwapConfig.
feat(swap_encryption/pr1): shared infra — SwapDaemonSet BaseResource + PKB framework swap config
Stacked PR 1/5. Adds the infrastructure all later PRs build on.
New files:
resources/container_service/swap_daemonset.py—SwapDaemonSet(BaseResource): full DaemonSet lifecycle (_Create,_Delete,WaitForPod,PodExec,RecoverPod). Added tospec.resourcesfor auto-teardown.resources/container_service/swap_config.py—GkeSwapConfig(BaseResource): writeslinuxConfig.swapConfigYAML for--system-config-from-file.EksSwapConfig(BaseResource): nodeadm stub (deferred to PR Add swap_config to node-pool spec #6780).PKB framework changes:
configs/container_spec.py—SwapConfigSpec(BaseSpec)+_SwapConfigDecoder;swap_configfield onNodepoolSpec(mirrorsSandboxSpecpattern)resources/container_service/container.py—swap_configattr onBaseNodePoolConfigresources/container_service/container_cluster.py— propagateswap_configin_InitializeNodePool()providers/gcp/google_kubernetes_engine.py—_AddNodeParamsToCmd()readsnodepool_config.swap_config: applies--system-config-from-file(linuxConfig.swapConfig + sysctls), setsUBUNTU_CONTAINERD,--no-enable-autorepair, boot-disk IOPS/throughputThin benchmark (
swap_encryption_benchmark.py):BENCHMARK_CONFIGdeclaresbenchmarknodepool withswap_config— GKE cluster creation handles provisioning automatically, no separate nodepool management neededPrepare(): deploySwapDaemonSet+_delete_default_pool()Run(): verifyswap_active,swap_encrypted; reportswap_cipher+swap_total_kbCleanup(): empty — PKB auto-deletesspec.resources